A review of coding theory


by Brian P. McArdle 


1. Introduction 


The general area of Coding Theory for telecommunications and 
computer applications is reviewed to provide a simple introduction 
to the subject. For further information, the reader should consult the 
books in the reference section. 

There is no formal definition of a code. Essentially, messages are
represented in some form more easily transmitted than normal written
language. In this article, a code is a digital electronic signal that
represents a message symbol, such as a letter or number. For example,
a teleprinter code would have to have a signal for every possible
symbol (26 letters, 10 numerals and other symbols) and signals for
every operation (that is, space, carriage return and line feed
controls). Figure 1 shows the arrangement.

Fig. 1. Encoding / decoding operation.


An encoding operation E turns a message symbol a_i into coded form
for transmission over a channel. The set {a_i} is the source alphabet
and {Pr(a_i)} is the set of probabilities associated with this
alphabet: Pr(a_i) is the probability that a_i occurs. In normal
language, this is the probability of occurrence of letters. The word
'channel' has a general meaning. It could be a cable, a radio link or
a storage medium from which the receiver retrieves the messages at
some later time. Obviously, the receiver must be able to apply a
decoding operation E^{-1}. Hence, the principal requirement for a
satisfactory code is that the coded symbols be uniquely decodable. In
mathematical terms, E[a_i] cannot represent more than one symbol:
E[a_i] cannot equal E[a_j] unless a_i and a_j are effectively the same
symbol. For example, E might not distinguish between upper and lower
case letters, so that the decoded messages are printed in upper case
letters only. Thus, apart from small variations that should not affect
normal understanding, the encoding operation, irrespective of its
complexity or purpose, must be exactly and uniquely reversible.

Fig. 2. Partition of the set of coded symbols.

A more formal mathematical definition is that the set of coded
symbols {E[a_i]} must be uniquely partitioned (that is, can be divided
into subsets that do not overlap) such that each partition can be
associated uniquely with a source symbol. Figure 2 shows the
arrangement.

The remainder of this article attempts to explain the meaning of E in
different applications, such as error detection and correction, and
encryption. It is always assumed that the encoding operation is
uniquely reversible unless otherwise stated. Another important
assumption is that a memory-less source is involved: the probabilities
Pr(a_i) are the probabilities of the general occurrence of these
symbols in normal language. In reality, letters occur in groups
(digraphs and trigraphs); 'i before e, except after c' is a well-known
expression. Consequently, the probability of occurrence of a
particular letter could be influenced by preceding letters. This
dependence is ignored here, and a memory-less source is assumed unless
otherwise stated. The analysis is mostly confined to digital coding
except for Section 6, which deals with coding for analogue signalling.


2. Different codes 
Codes can be analysed from many different viewpoints, but engineers 
are generally concerned only with two main categories. 


(a) Fixed length codes 

Every character of such a code is represented by a block of bits with
every block having the same length. A typical example would be a
computer code, such as ASCII or EBCDIC. EBCDIC uses blocks of eight
bits, and ASCII, strictly a seven-bit code, is normally stored in
eight-bit blocks as well. With eight bits, there is a total of 2^8 or
256 different blocks. Any two blocks of the same code would have to
differ in at least one bit. The blocks need not be symbols (letters,
numbers, punctuation marks) but can be controls (carriage return, line
feed, etc.).


(b) Variable length codes 

Consider an alphabet of five symbols {a, b, c, d, e}. In (a), this
would require a code of three bits per block or symbol with
2^3 - 5 = 3 redundant blocks. However, if the following arrangement of
three blocks of two bits per block and two blocks of three bits per
block is used,


a = 00, b = 01, c = 10, d = 110, e = 111,


the average length of a message would be reduced. Blocks still have a
specific length, but it is no longer the same fixed length. The basic
requirement for unique decoding must still be maintained. For example,
the bit sequence 011100011110 is easily decoded to bdaec with no
errors. This must apply for all combinations of the symbols.
An important quantity is the average length L of a block given by


    L = \sum_j p_j l_j    [Eq. 1]


where p_j is the probability of occurrence of block type j with l_j
bits. Ideally, this should be as small as possible to minimize the
total number of bits per message. To ensure this requirement, the
large probabilities would be paired with the smaller blocks. Morse
code is another example, where the common letter e is a dot, but z is
two dashes followed by two dots.
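
As a rough illustration of Eq. 1, the following Python sketch computes the average block length of the five-symbol code above. The probabilities are hypothetical values chosen only for the example; they do not come from the article.

```python
# Variable length code from the text.
code = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

# Hypothetical symbol probabilities (an assumption for illustration only).
prob = {"a": 0.30, "b": 0.25, "c": 0.25, "d": 0.10, "e": 0.10}


def average_length(code, prob):
    """Eq. 1: L = sum over j of p_j * l_j, in bits per symbol."""
    return sum(prob[s] * len(code[s]) for s in code)


L_variable = average_length(code, prob)  # variable length code
L_fixed = 3.0                            # fixed length code needs 3 bits per symbol
print(L_variable, L_fixed)
```

With these assumed probabilities the variable length code needs fewer bits per symbol on average than the three-bit fixed length code, because the short blocks carry the likely symbols.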

This particular example has a special significance in addition to
variable length blocks. If it is rewritten in the form of a diagram as
below, it seems to have a tree-type structure with different branches.

    root
     +-- 0
     |    +-- 0 : a
     |    +-- 1 : b
     +-- 1
          +-- 0 : c
          +-- 1
               +-- 0 : d
               +-- 1 : e

Each branch is terminated by a symbol. The branches join together at
nodal points which do not in themselves represent symbols. This
arrangement indicates an instantaneous code. This means that the
decoding operation does not require a 'memory', that is, it does not
refer to blocks before or after any block that is being decoded. In
the decoding of bdaec, it was not necessary to test the 3rd bit before
deciding that the first two bits, 01, represented b. This property
remained true for the full operation and holds for all decoding
operations irrespective of the combinations of symbols. (This should
not be confused with a memory-less source, defined earlier, where
there is no relationship between the occurrence of different symbols.)
In an instantaneous code, no block can be a prefix of another block.
Huffman codes, which are too involved to be considered in a simple
overview, come into this category. However, it must be emphasized that
an arbitrary collection of blocks of varying lengths does not
necessarily make an instantaneous code. There is a specific
requirement given by the Kraft inequality


    \sum_j 2^{-l_j} \le 1

where l_j is the length in bits of block j, to form such a code.
Further analysis is outside the scope of this paper and the reader is
referred to the Reference Section for further study.
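
The instantaneous property and the Kraft inequality can be checked directly for the five-symbol code. The Python sketch below is illustrative only; the function names are invented for this example.

```python
from fractions import Fraction

CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}


def is_prefix_free(code):
    """True if no block is a prefix of a different block."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def kraft_sum(code):
    """Left-hand side of the Kraft inequality: sum of 2^(-l_j)."""
    return sum(Fraction(1, 2 ** len(w)) for w in code.values())


def decode(bits, code):
    """Decode bit by bit; a symbol is emitted as soon as its block is complete."""
    lookup = {w: s for s, w in code.items()}
    out, block = [], ""
    for bit in bits:
        block += bit
        if block in lookup:  # no look-ahead is needed for a prefix-free code
            out.append(lookup[block])
            block = ""
    if block:
        raise ValueError("trailing bits do not form a complete block")
    return "".join(out)


print(decode("011100011110", CODE), is_prefix_free(CODE), kraft_sum(CODE))
```

The decoder never inspects a bit beyond the end of the current block, which is exactly the sense in which the code is instantaneous.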


3. Information theory 


Information theory has steadily increased its profile over the past
few years and it is no longer possible to study telecommunications,
especially coding, without touching on it somewhere. At first glance,
the ideas behind it can appear too general and abstract for simple,
direct applications. However, the basic concepts of entropy,
equivocation and channel capacity come from information theory, which
in turn has influenced coding theory, and they require some
explanation.

There is a fundamental difference between an electronic signal and its
value as information. In sound broadcasting, an unmodulated carrier
would not convey any programme content to a listener. Therefore, there
is a need to be able to quantify the value of a signal as information.
In the 1920s, Hartley put forward the idea that the logarithmic
function could be used as a measure of information. This was one of
the landmarks in information theory. If two messages, a_i and a_j, are
independent,

    \log[\Pr(a_i \text{ and } a_j)] = \log[\Pr(a_i)] + \log[\Pr(a_j)]    [Eq. 2]

and the base 2 is normally used. Remember that 'log' is not a linear
function. The idea that the information content of two independent
messages is simply the sum of the information of each separate message
seems instinctively correct. However, this method of measuring
information has no connection with an actual signalling system. The
entropy for A = {a_i} is given by


    H(A) = -\sum_i \Pr(a_i) \log \Pr(a_i)    [Eq. 3]


and is a measure of the average information. An alternative
explanation, which has become more common in recent years, is that it
is a measure of the uncertainty in the information. H(A) = 0 means
that Pr(a_i) = 1 for one message and Pr(a_j) = 0 for all other
messages. Consequently, there is no doubt about the message. The
maximum value occurs when the probabilities are the same and all
messages are equally likely. If there are n possible messages, H(A) is
between 0 and log(n). The dimension is information bits per symbol.
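
The limiting cases of the entropy described above can be checked numerically. This is a minimal Python sketch of Eq. 3 using base-2 logarithms; the function name and the sample distributions are chosen for illustration.

```python
import math


def entropy(probs):
    """Eq. 3: H(A) = -sum Pr(a_i) * log2 Pr(a_i), in bits per symbol.

    Terms with zero probability contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


certain = entropy([1.0, 0.0, 0.0, 0.0])      # one message is certain
uniform = entropy([0.25, 0.25, 0.25, 0.25])  # four equally likely messages
skewed = entropy([0.7, 0.1, 0.1, 0.1])       # somewhere in between
print(certain, uniform, skewed)
```

The certain source gives zero entropy, the uniform source reaches the maximum of log2(4) = 2 bits, and any other distribution over four messages lies between the two.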

To apply information theory to coding theory, consider Fig. 3 where
there is a noisy channel between sender A and receiver B. The joint
entropy is given by the equation

    H(A,B) = H(A) + H(B/A)    [Eq. 4]

where H(B/A) is known as the conditional entropy or equivocation.
This in turn is defined as


    H(B/A) = \sum_i H(B/a_i) \Pr(a_i)    [Eq. 5]


In non-mathematical terms, H(B/A) is a measure of the information
loss in transmission. The channel capacity is given by

    C(A,B) = \max [H(A) - H(A/B)]    [Eq. 6]

This appears correct because the limit on the information conveyed
over a channel is determined by the original uncertainty of that
information (before reception) reduced by the uncertainty after
reception. Essentially, capacity is limited only by noise, and the
Hartley-Shannon law sets an upper limit of W log(1 + S/N), where W is
the information bandwidth, S is the signal power and N is the noise
power. For technical reasons, present-day systems operate well below
this limit. The reader is referred to the Reference Section for
further study.

Fig. 3. Communications channel.


From the point of view of coding and electronic engineering, Eq. 6
can be simplified for normal use. Consider the case of a binary
channel where A and B represent the input and output respectively. In
general, the probability of a '0' or a '1' is 1/2, which gives
H(A) = log(2) = 1. (The entropy of the source alphabet could be
computed from the probability of occurrence of the various symbols,
but it is the channel that is now under consideration.) If p is the
probability of an error where a '0' is received as a '1', or vice
versa, the channel conditional probabilities are

    Pr(0/0) = Pr(1/1) = 1 - p,    Pr(1/0) = Pr(0/1) = p.

From Eq. 5:

    H(B/A) = \Pr(0)[-(1-p)\log(1-p) - p\log(p)]
           + \Pr(1)[-(1-p)\log(1-p) - p\log(p)]    [Eq. 7]

Since Pr(0) + Pr(1) = 1 and, for this symmetric channel with
equiprobable inputs, H(A/B) takes the same value as H(B/A), Eq. 6
gives a new expression for the channel capacity:

    C(A,B) = 1 + p\log(p) + (1-p)\log(1-p)    [Eq. 8]

in bits per symbol. This is the usual expression in most textbooks on
telecommunications. If the signalling rate is R symbols per second,
the right-hand side is multiplied by R to give bits per second. Thus,
information theory can be useful in the analysis of codes. The entire
area has become extensive and has been treated only superficially
here.
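
The behaviour of Eq. 8 is easy to explore numerically. The short Python sketch below evaluates the capacity of the binary channel for a given error probability p; the function names are chosen for this example.

```python
import math


def binary_entropy(p):
    """-p*log2(p) - (1-p)*log2(1-p), taken as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p):
    """Eq. 8: C = 1 + p*log2(p) + (1-p)*log2(1-p), in bits per symbol."""
    return 1.0 - binary_entropy(p)


for p in (0.0, 0.01, 0.1, 0.5):
    print(p, bsc_capacity(p))
```

A noiseless channel (p = 0) carries one bit per symbol, while p = 1/2 gives zero capacity because the output is then independent of the input. Multiplying by a signalling rate R in symbols per second gives bits per second.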


4. Error detection and correction 


Error detection and correction is one of the main applications of
coding theory and has paralleled its development. Figure 3 showed the
problem of errors, where a '0' can be received as a '1' or vice
versa. The use of the word 'receiver' is general in that it could
represent a storage medium, and so on. It suffices to say that data is
corrupted, which limits its value upon reproduction. Section 3
demonstrated that channel capacity is limited only by noise. To reduce
the effects of errors, and therefore noise, extra bits are added to a
block of data bits to create new and larger blocks, which in turn
allow errors to be identified.

Consider the (7,4) Hamming code as follows:


    Position:   7     6     5     4     3     2     1
    Bit:        d_7   d_6   d_5   c_4   d_3   c_2   c_1


    c_4 = (d_7 + d_6 + d_5) mod 2
    c_2 = (d_7 + d_6 + d_3) mod 2
    c_1 = (d_7 + d_5 + d_3) mod 2


There are three check bits in positions 1, 2 and 4, which have been
derived from the data bits in the other four positions. The code is
linear in the sense that the check bits are linear combinations of the
data bits and the encoding operation is simply the application of the
three linear equations. Since every block has a total of seven bits
without exception, the code is in the fixed length category. The
position of the check bits within the block is very significant. A
receiver generates the check bits from the received data bits and
applies the decoding rule in Appendix 1. For example, if d_5 has been
altered, c_1 and c_4 will not be validated, and so on. The arrangement
to check five data bits is given in Appendix 2. In both these
examples, the set of coded blocks is such that the minimum variation
between any two blocks in the same set is three bits. This is known as
the Hamming distance. The reader is referred to Reference 2 for a more
detailed explanation. The main point to note is that the method
identifies only one error per block. In general, r check bits have
(2^r - 1) possible non-zero combinations, and thus r check bits in a
total block size of n bits must satisfy the condition (2^r - 1) >= n
in order to identify and therefore correct one error. To correct two
or more errors per block, a code with a larger Hamming distance and a
more complicated arrangement would be needed.
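
The (7,4) scheme can be sketched in a few lines of Python. The block is held as a dictionary keyed by bit position (7 down to 1), and the receiver locates a single error by adding the positions of the check bits that fail, as described in Appendix 1. The function names are illustrative assumptions.

```python
def encode(d7, d6, d5, d3):
    """Build a 7-bit block {position: bit} from four data bits."""
    c4 = (d7 + d6 + d5) % 2
    c2 = (d7 + d6 + d3) % 2
    c1 = (d7 + d5 + d3) % 2
    return {7: d7, 6: d6, 5: d5, 4: c4, 3: d3, 2: c2, 1: c1}


def correct(block):
    """Recompute the check bits from the received data bits.

    The sum of the positions of the check bits that are not validated
    gives the position of a single erroneous bit (0 means no error).
    Returns the corrected block and that position.
    """
    expected = encode(block[7], block[6], block[5], block[3])
    location = sum(pos for pos in (1, 2, 4) if expected[pos] != block[pos])
    fixed = dict(block)
    if location:
        fixed[location] ^= 1  # replace a 0 by a 1 or vice versa
    return fixed, location


sent = encode(1, 0, 1, 1)
received = dict(sent)
received[5] ^= 1  # corrupt d_5 in transit
print(correct(received))
```

If two bits of a block are corrupted, the computed location points at a third, correct bit, which is then wrongly changed. This is the limitation of a code with a Hamming distance of three.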

Cyclic codes are the most commonly used for error detection and
correction. These are also of the fixed length variety. For a block of
total size n, the check bits are produced by a generator polynomial
which is a factor of (x^n + 1). A typical example is the specification
MPT 1317 for the transmission of data over radio links. The format is
as follows:


               DATA               CHECK          PARITY
    Bit:   64 63 62 ... 17   |   16 15 ... 2   |    1


with a block size of 64 bits. However, the first bit is for parity and
is generated by the other 63 bits. The 15 check bits are generated
from the 48 data bits using the generator polynomial

    g(x) = x^{15} + x^{14} + x^{13} + x^{11} + x^4 + x^2 + 1    [Eq. 9]

which is a factor of (x^63 + 1). Refer to Appendix 3 for an exact
breakdown. The data bits are the coefficients of the terms x^62 down
to x^15 inclusive. Some books write the data bits on the right-hand
side of the format, but this is not important provided they represent
the high power terms of the polynomial. The polynomial consisting of
only the data bits is divided by g(x). The remainder is then added
back to produce a new polynomial such that g(x) is now a factor of the
new polynomial. Since the check bits are essentially the original
remainder, they represent the terms x^14 down to x^0. Then a parity
bit is added in order to detect odd numbers of 1s and the full 64-bit
block should have even parity. Refer to Appendix 4 for the generation
of a parity bit. The overall result is that the code can identify and
correct up to four errors per block. This is a considerable
improvement on the (7,4) Hamming code, but the operation is much more
involved and the block size nine times larger. To check for errors,
the receiver divides the polynomial by g(x) and there should be no
remainder.
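
The division procedure can be sketched in Python by holding a polynomial over GF(2) as an integer whose bit k is the coefficient of x^k. The generator used here, x^15 + x^14 + x^13 + x^11 + x^4 + x^2 + 1, is an assumption about the intended polynomial of Eq. 9, and the function names are invented for the example. Only detection is shown; correction is not attempted.

```python
# Assumed generator polynomial: x^15 + x^14 + x^13 + x^11 + x^4 + x^2 + 1.
G = sum(1 << k for k in (15, 14, 13, 11, 4, 2, 0))


def gf2_mod(dividend, divisor):
    """Remainder of polynomial division with coefficients taken modulo 2."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def encode(data, n_data=48, g=G):
    """Return a 64-bit block: 48 data bits, 15 check bits, 1 parity bit."""
    if not 0 <= data < (1 << n_data):
        raise ValueError("data does not fit in the data field")
    n_check = g.bit_length() - 1            # 15 check bits
    shifted = data << n_check               # data occupies x^62 .. x^15
    codeword = shifted ^ gf2_mod(shifted, g)  # add the remainder back
    parity = bin(codeword).count("1") % 2   # makes the overall parity even
    return (codeword << 1) | parity         # parity is bit 1 of the block


def check(block, g=G):
    """True if g(x) divides the 63-bit codeword and overall parity is even."""
    even_parity = bin(block).count("1") % 2 == 0
    return even_parity and gf2_mod(block >> 1, g) == 0


block = encode(0x123456789ABC)
print(check(block), check(block ^ (1 << 20)))
```

Adding the remainder back makes the codeword an exact multiple of g(x), so a non-zero remainder at the receiver indicates that the block has been corrupted.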

Another example is the POCSAG code for paging, which uses the format


               DATA               CHECK          PARITY
    Bit:   32 31 30 ... 12   |   11 10 ... 2   |    1


and the generator polynomial

    g(x) = x^{10} + x^9 + x^8 + x^6 + x^5 + x^3 + 1    [Eq. 10]

which is a factor of (x^31 + 1). Refer to Appendix 5 for an exact
breakdown. The overall method is the same, with the 21 data bits
generating the 10 check bits to produce a 31-bit block plus an extra
parity bit.


5. Encryption 


In encryption, the E operation, defined in Section 1, represents a
secrecy operation and is usually written as E_K in most textbooks. The
parameter K is known as the key and its purpose is to vary the
operation. This is in complete contrast to error detection and
correction, where exactly the same operation is performed on all
blocks without exception. The importance of K is that it is generally
the part of E_K that is kept secret. In a publicly known algorithm,
such as the Data Encryption Standard (DES), the complete algorithm is
known. A user chooses a key from the set of possible keys {K} and
encrypts the data. Thus, only encrypted data appears on the channel of
Fig. 1. The data can be recovered by the inverse or decryption
operation E_K^{-1}, which also requires the correct key. If the key in
use is kept secret and only known to authorized receivers, the data is
kept secret. Obviously, {K} must be sufficiently large to prevent an
unauthorized user from trying each key in turn. There are a number of
other requirements that are outside the scope of this paper.


There are three main methods of encryption.


(a) Stream encryption 

In Fig. 4, each bit of the data is added modulo 2 to a key bit using
an XOR logic operation. A sequence of key bits is produced by the key
generator such that each data bit d is encrypted by its own particular
key bit k.


    Data bit d, key bit k (from key generator) -> XOR -> (d + k) mod 2


Fig. 4. Stream encryption.


The authorized receiver must know the method of key generation to
reproduce the exact same sequence. The inverse operation is simply to
apply the key sequence in the correct order to the sequence of
encrypted bits. It would be too complicated to discuss the various
techniques of key generation, but the most common method uses shift
registers to generate a pseudo-random binary sequence. Generally, part
of this process must be kept secret, such as the number of stages and
the feedback arrangements. The current proposals to provide encryption
facilities on the cellular system GSM or for digital short-range radio
DSRR are believed to use a form of stream encryption. However, the
information is confidential and it is very likely that the exact
method will not be made public.
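
The principle can be illustrated with a toy shift register in Python. The register length, starting contents and feedback taps below are arbitrary assumptions chosen for the example; they are far too small to be secure and are not taken from any real system.

```python
def shift_register_bits(seed, taps):
    """Yield key bits from a feedback shift register.

    seed -- list of 0/1 stage contents (must not be all zero)
    taps -- indices of the stages XORed together to form the feedback bit
    """
    state = list(seed)
    while True:
        yield state[-1]                  # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]  # shift by one stage


def stream_xor(bits, seed, taps):
    """Add each data bit modulo 2 to its own key bit."""
    key = shift_register_bits(seed, taps)
    return [(d + next(key)) % 2 for d in bits]


SEED, TAPS = [1, 0, 0, 1], (0, 3)        # hypothetical secret parameters
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
encrypted = stream_xor(data, SEED, TAPS)
recovered = stream_xor(encrypted, SEED, TAPS)
print(encrypted, recovered)
```

Because (d + k + k) mod 2 = d, applying the same key sequence a second time recovers the data, so encryption and decryption are the same operation.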


(b) Block encryption 

Block encryption (see Fig. 5) differs from stream encryption in that
a block is encrypted as a single unit. The most widely known method is
the US Data Encryption Standard, which uses blocks of 64 bits for
input and output. The algorithm was published in 1977 and the exact
method is for public information. The actual key is a 56-bit block, so
that the number of possible keys is 2^56. In operation, the authorized
receiver would know the particular key in use and apply the inverse or
decryption algorithm. Controversy has always surrounded the key size
and recent articles have suggested methods for an improved DES.


    Input block -> Operation (controlled by key) -> Output block

Fig. 5. Block encryption.


The main advantage over (a) is that a satisfactory block operation
creates interdependence between the bits of a block. If one bit of an
input block is varied, a number of bits in the output block are
altered. However, it is generally much slower than stream encryption
and cannot be used for high-speed telecommunications applications.


(c) Public key encryption 
Public key encryption differs from the methods in (a) and (b) in that
part of the key is made public, whence its name. The main requirement
is that the part which must remain secret should not be easily
deducible from the public part. A typical example is the RSA method
introduced in 1978. Each user publishes two numbers, N and e. N is
very large, of the order of 80 digits, and is the product of two
primes, P and Q, while e and d satisfy the equation

    1 = ed mod (P-1)(Q-1).    [Eq. 11]

Of these, only e is made public: d, P and Q remain secret. If user A
wishes to forward the message 'a' to user B, A looks up the parameters
N and e for B and transmits

    b = a^e mod N.    [Eq. 12]

User B recovers the original message from

    a = b^d mod N.    [Eq. 13]

Since d is one of the secret parameters, this cannot be done by any
other user. From a secrecy point of view, an unauthorized user would
have to factor N into P and Q to calculate d. Thus, as long as N is
sufficiently large, this is impractical and the method is secure.
There are other methods, such as the Merkle-Hellman knapsack method,
but they all follow the same principle of a public and a private key.
Equations 12 and 13 are the equivalents of the encoding and decoding
operations.
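
Equations 11 to 13 can be followed with very small numbers. The primes and exponent in this Python sketch are toy values chosen for illustration and offer no security; a practical N would be far larger. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
# Toy parameters (assumptions for illustration only).
P, Q = 61, 53
N = P * Q                    # public modulus
phi = (P - 1) * (Q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # secret exponent satisfying Eq. 11

a = 65                       # message, must be smaller than N
b = pow(a, e, N)             # Eq. 12: user A encrypts with B's public N and e
recovered = pow(b, d, N)     # Eq. 13: user B decrypts with the secret d

print(N, d, b, recovered)
```

Anyone can compute b from the published N and e, but recovering a requires d, which in turn requires the factors P and Q of N.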


6. Coding for analogue signalling 
In the preceding four sections, it was assumed that digital signal
processing was in use. However, codes are also used in analogue
electronics, but their application is rather limited. For example, in
the PMR service, CTCSS (continuous tone controlled signalling system)
has been around for a number of years. During transmission, an encoder
generates a specific audio tone that modulates the radio frequency
carrier. This tone is continuous for the duration of a message. In the
absence of a CTCSS signal, the decoder at the receiving end is
deactuated.

Another example is tone selective calling, such as EEA and ZVEI. In
these methods, a sequence of five tones is used to form an address for
a receiver. Both EEA and ZVEI have a total of 12 possible tones. Each
possible address consists of a set of five, which actuates the
receiver from the point of view of the user. However, despite these
examples, coding has remained almost exclusively digital and the
cellular GSM standard actually prohibits the use of tones.

On the secrecy side, there are voice scramblers that use frequency
inversion, but increasingly the trend has been to digitize speech (for
instance, ADPCM; see Appendix 6) and to apply the techniques of
Section 5. Coding in analogue signal processing is very restricted and
need not be considered seriously.

Appendix 1


The receiver re-calculates the check bits and validates them against
the received values.


c_1 and c_2 are not validated => d_3 is incorrect

c_1 and c_4 are not validated => d_5 is incorrect

c_2 and c_4 are not validated => d_6 is incorrect

c_1, c_2 and c_4 are not validated => d_7 is incorrect

The sum of the indices of the check bits that fail indicates the
location of the erroneous bit. The correction process replaces a '0'
by a '1' or vice versa. The principal difficulty is that two errors
can cause a correct bit to be changed.


Appendix 2 


The Hamming code for five data bits is d_9 c_8 d_7 d_6 d_5 c_4 d_3 c_2 c_1
and requires four check bits with the same procedure as in the (7,4)
code. r check bits can test up to 2^r - 1 locations. For r = 3, this
gives n = 2^3 - 1 = 7, which leaves four data bits. For r = 4, there
is a block size n = 15, which allows for 11 data bits and four check
bits in the order:


d_15 d_14 d_13 d_12 d_11 d_10 d_9 c_8 d_7 d_6 d_5 c_4 d_3 c_2 c_1.


Appendix 3 


For a 63-bit block, the factors of the modulus are: 


